Tech Debt Cleanup and Docs Update #219
Conversation
Clarify actor pattern and responsibilities across threads
refactor requestmanager to clearly designate responsibilities in actor pattern
remove the actor pattern from asyncloader, as it's not needed. it works well as simply a locked data structure and is about 10x simpler
update references in architecture doc to reflect IPLD linksystem
eb9a9d2 to 86c0646
al.stateLk.Lock()
defer al.stateLk.Unlock()
this PR adds this single mutex lock around the loader's state. Do we have reasonable confidence that we won't end up with lock contention over it?
The old code sent a message to a queue channel, and a single goroutine owned the state and processed those messages sequentially. So the amount of contention seems basically the same before and after.
One significant difference is the priority and ordering; with a channel, before you would be guaranteed first-come-first-serve. With a mutex, it's always a semi-random race to decide what goroutine gets to grab the mutex, so if incoming messages keep coming, you could have high latencies with older messages getting unlucky and not grabbing the mutex. It's not clear to me whether order matters here.
One last thought: sharing memory does give us more ability to be performant, if we want to. For example, using a RWMutex, or multiple mutexes, or sync/atomic. So in terms of contention, I think it can be better than a single goroutine with a message queue, if we care enough to optimize it.
Yea I feel quite confident about this being a better, more reliable strategy.
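To make the trade-off above concrete, here is a minimal, purely illustrative sketch of the two approaches being compared: a single goroutine owning state behind a message channel versus a plain mutex-protected struct. The names are hypothetical and not the actual go-graphsync types.

// Illustrative only; counterActor and counterLocked are hypothetical types.
package example

import "sync"

// Actor style: one goroutine owns the state and applies messages in FIFO order.
type counterActor struct {
	msgs chan func(state *int)
}

func newCounterActor() *counterActor {
	a := &counterActor{msgs: make(chan func(state *int))}
	go func() {
		state := 0
		for msg := range a.msgs {
			msg(&state)
		}
	}()
	return a
}

func (a *counterActor) Increment() { a.msgs <- func(state *int) { *state++ } }

// Mutex style: callers touch shared state directly. Order of entry is up to
// the scheduler, but there is no dedicated goroutine or message type.
type counterLocked struct {
	stateLk sync.Mutex
	state   int
}

func (c *counterLocked) Increment() {
	c.stateLk.Lock()
	defer c.stateLk.Unlock()
	c.state++
}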
// to them.
type RequestManager struct {
	ctx    context.Context
	cancel func()
context.CancelFunc?
currently it's just func() everywhere in the code and I should change it all at once. will do in separate cleanup PR
for cancelMessageChannel != nil || incomingResponses != nil || incomingErrors != nil {
	select {
	case cancelMessageChannel <- &cancelRequestMessage{requestID, false, nil, nil}:
		cancelMessageChannel = nil
should this close the channel as well? a comment might be useful since this pattern looks weird otherwise
If you look a couple of lines above, what we're doing here is using cancelMessageChannel = nil to send exactly once -- but the actual channel is the main message channel for the request manager.
I agree, this is overcomplicated. I think it's probably ok to do the channel send once before the loop, but for this PR I'd rather not change that functionality.
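For readers unfamiliar with the idiom being described, here is a standalone sketch (with placeholder types, not the real go-graphsync ones) of how setting a channel variable to nil disables its select case after the first successful send:

// Sketch only; cancelRequestMessage and requestID stand in for the real types.
package example

type cancelRequestMessage struct{ requestID int }

func sendCancelOnce(messages chan<- *cancelRequestMessage, done <-chan struct{}, requestID int) {
	for messages != nil {
		select {
		case messages <- &cancelRequestMessage{requestID: requestID}:
			// Sends on a nil channel block forever, so nil-ing the local
			// variable disables this case on later iterations: exactly one send.
			messages = nil
		case <-done:
			return
		}
	}
}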
	return remainingResponses
}

func (rm *RequestManager) updateLastResponses(responses []gsmsg.GraphSyncResponse) {
what this method (amongst others) does seems not fully reflected in the name. some comments on expected behavior of these methods would be useful
will add in cleanup PR
I left two comments on one of the commits, since the whole diff was pretty large. I'm not sure how well GitHub will deal with that :)
@@ -1,6 +1,6 @@
 module github.com/ipfs/go-graphsync

-go 1.12
+go 1.13
maybe time to get this repo onboard with unified ci :) it would take care of bumping this too
yes agreed.
stateLk sync.Mutex
I'd explicitly document what fields this mutex protects, even if that's just "all fields below".
resolved
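For context, the documentation convention being asked for usually looks something like the sketch below; the field names here are illustrative, not the actual AsyncLoader fields.

// Illustrative struct; the stateLk comment convention is the only point.
package example

import "sync"

type asyncLoader struct {
	// stateLk protects all fields below it.
	stateLk         sync.Mutex
	activeRequests  map[int]struct{}
	alternateQueues map[string]struct{}
}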
// Shutdown ends processing for the want manager.
func (rm *RequestManager) Shutdown() {
	rm.cancel()
}
I'm a bit confused; if the user supplies a context when creating a RequestManager, could they not just cancel the context themselves to do a shutdown? Having RequestManager hold the cancel func and expose a Shutdown method feels unnecessary.
One reason we might have for exposing Shutdown methods is to also block until all goroutines have stopped and resources have been freed up. If we think we'll want that later on, then I think leaving these in with a TODO makes sense. Otherwise, I'd personally remove the Shutdown APIs.
Also, if we went with the "we'll eventually want to block until all request goroutines are stopped" route, then I think these methods should gain a context parameter, too. That way, I can say "graceful shutdown for 10s to stop accepting requests, and after that, kill all requests which aren't done yet". This is how net/http does it: https://pkg.go.dev/net/http#Server.Shutdown
I agree this is odd. This is the pattern of the repo. I'd rather revisit the shutdown behavior globally in a separate ticket.
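For reference, a context-aware shutdown along the lines the reviewer describes (modeled loosely on net/http's Server.Shutdown) could look like the sketch below. This is an assumption about a possible future API, not how RequestManager works today.

// Hypothetical sketch; "manager" is not the real RequestManager.
package example

import (
	"context"
	"sync"
)

type manager struct {
	cancel context.CancelFunc // stops accepting new work
	wg     sync.WaitGroup     // tracks in-flight request goroutines
}

// Shutdown stops accepting new requests, then waits for in-flight work to
// finish or for ctx to expire, whichever comes first.
func (m *manager) Shutdown(ctx context.Context) error {
	m.cancel()
	done := make(chan struct{})
	go func() {
		m.wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}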
defer al.stateLk.Unlock()
_, ok := al.alternateQueues[name]
if !ok {
	return errors.New("unknown persistence option")
include the name? :)
resolved
@@ -77,3 +77,47 @@ var ResponseCodeToName = map[ResponseStatusCode]string{
	RequestFailedContentNotFound: "RequestFailedContentNotFound",
	RequestCancelled:             "RequestCancelled",
while you're refactoring, you could automate this table and String method with https://pkg.go.dev/golang.org/x/tools/cmd/stringer
omg where was this tool all my life! You should check over at go-fil-markets and go-data-transfer for some pretty awesome comedy here.
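For anyone following along, wiring up stringer is roughly a one-line go:generate directive next to the type, as in the sketch below. The constant values shown are illustrative, not the real graphsync status codes.

// Sketch of hooking up stringer; running `go generate ./...` (with
// golang.org/x/tools/cmd/stringer installed) emits a String() method,
// which could replace the hand-maintained ResponseCodeToName table.
package example

//go:generate stringer -type=ResponseStatusCode
type ResponseStatusCode int32

// Illustrative constants only.
const (
	RequestAcknowledged ResponseStatusCode = iota
	RequestCompletedFull
	RequestFailedContentNotFound
	RequestCancelled
)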
responsecode.go
case RequestCancelled:
	return RequestCancelledErr{}
default:
	return fmt.Errorf("Unknown")
return fmt.Errorf("unknown response status code: %d", c)
resolved
I've responded to your comments. Some of them relate to unchanged code that has just been moved around. I agree generally with all the comments, but as you can see I've suggested deferring them to minimize this changeset. I've documented these cleanup issues here: If I can get your approvals on this PR, I'd like to go ahead and merge.
Goals
Do a big cleanup pass on go-graphsync to make it easier to reason about and work with. Belatedly update docs for IPLD LinkSystem.
Implementation
THIS PR LOOKS LARGE BUT IT'S NOT THAT BIG. These are the main changes:
I can break this into multiple PRs if need be, but you can also review the individual commits if you like.